New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve errors location tracking #14130
Improve errors location tracking #14130
Conversation
@@ -78,7 +78,7 @@ export default class ParserError extends CommentsParser { | |||
else if (pos === this.state.lastTokStart) loc = this.state.lastTokStartLoc; | |||
else if (pos === this.state.end) loc = this.state.endLoc; | |||
else if (pos === this.state.lastTokEnd) loc = this.state.lastTokEndLoc; | |||
else loc = getLineInfo(this.input, pos); | |||
else loc = getLineInfo(this.state.startLoc.line, this.input, pos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.state.startLoc
will always be updated when a new AST node is parsed. We can make getLineInfo
a parser method and then read startLine
from parser options.
export function getLineInfo(input: string, offset: number): Position { | ||
let line = 1; | ||
export function getLineInfo( | ||
startLine: number, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we support startLine
, then we should also support startColumn
.
I am looking at possibly just making the various It would also be nice if "loc" and "pos" weren't separate entities that would need to be passed around together. For example, if export class Position {
line: number;
column: number;
charIndex: number;
} But this is of course a breaking change and outside the scope of all this. |
I have a significantly different change that accounts for column, etc. too. Will push either later today or tomorrow. |
I'm going to try to get an updated PR submitted later today. Just writing a bunch of tests and stuff now. |
d251559
to
6e475c5
Compare
OK, I have added a bunch of info to the main section above explaining this change. I am now in the process of trying something with the tests that would allow us to try every existing test with different startLine/startColumn changes and thus avoid having to figure out the right places to test this. |
raise( | ||
pos: number, | ||
{ code, reasonCode, template }: ErrorTemplate, | ||
origin: Origin, | ||
...params: any | ||
): Error | empty { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid having to always specify a new object when calling .raise
, maybe we could divide it in two separate functions:
raise(pos: Position, { code, reasonCode, template }: ErrorTemplate, ...params: any): Error | empty {
return this.raiseWithData(pos, { code, reasonCode }, template, ...params);
}
raiseAtNode(node: Node, { code, reasonCode, template }: ErrorTemplate, ...params: any): Error | empty {
return this.raiseWithData(origin.node.loc.start, { code, reasonCode }, template, ...params);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually originally had this, but decided against it for a couple reasons:
- I wanted it to read like throwing an error "throw SomeError(metadata)", which calls out what matters most at that line "hey, this is a MissingSemicolonError" or "hey, this is RestTrailingCommaError" vs. front-loading the information that matters least, the associated metadata, when reading the code, like doing throw "metadata on SomeError";
- I think we will potentially want to add more to this object later. As I mentioned in the above, we may eventually want to expand this to
{ at: length: }
for ranges (or{ start, end }
or however you want to do that), and I think it would also eventually be nice to be able to supply context info (we currently have this info but it gets stringified for example, but can be very useful to the user to be able to do error.enumName vs. parsing message for enumName). As such, I wanted to put in the scaffolding for something that could carry additional info that could be easily read in the future. - From my testing there seems to be zero performance benefit to avoiding object creation at this point, for two reasons. First, it is a seldom occurring event, as it happens once per error raise (vs. getLineInfo which was an iterative event that can lead to an O(N) slowdown per Error, where N = source length). Secondly, if we did consider it to be performance sensitive, then we'd for example be much better served by instead changing
raiseWithData
to take(missingPlugin?: Array<string>, code?: string)
instead of{ missingPlugin?: Array<string>, code?: string, }
(since this objectdata
object gets thrown away immediately when yet another object is created on the corresponding call to_raise
), and this change would basically affect all error raises "for free" transparently since it's an internal function call inraise
. However, I also don't think this is a performance issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sounds good 👍
// this.index = index; | ||
Object.defineProperty(this, "index", { enumerable: false, value: index }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably go in a minor, because it's a new observable property on loc
objects in the AST (making it non-enumerable hides it from the fixtures, but it's still there 😛).
However, I would like to merge this PR as soon as possible because:
- This PR also fixes different
errorRecovery
exceptions and some other bugs - This PR conflicts with virtually everything touching the parser, and rebasing it will not be easy.
For now, we can hide this index property in a new const indexes = new WeakMap()
exported by this file, and do
// this.index = index; | |
Object.defineProperty(this, "index", { enumerable: false, value: index }); | |
// this.index = index; | |
indexes.set(this, index); |
And replace /([\w\.]+)\.index/g
with indexes.get($1)
. Then, we can discuss if we want to expose it as a public property in Babel 7.17.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, can make that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nicolo-ribaudo @tolmasky despite swap to WeakMap
just having typed member in
index: number; |
seems to actually init this field with void 0
( see https://www.runpkg.com/?@babel/parser@7.16.10/lib/index.js#9 ).
This did trip some of our tests ( TBF the test was in practice too strict and didn't allow for any uncounted for fields, so we will be relaxing them a bit ), but most importantly, I just want to check with you wether this is acceptable for you given the comments above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @pieh, I actually reported this immediately after we shipped as I also noticed it: #14182. Apologies though, as I had the discussion with @nicolo-ribaudo offline, where he advised that it was probably better to just hash it out here: #14174, as we might just be adding them in fully in the next minor update. But I'll let @nicolo-ribaudo comment more here as I am definitely open to removing that line for the next patch update if its causing any issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no problem here from my side (as I mentioned - our tests/validation were just too strict and brittle because of that - I really should just validate keys we do need (which is line
/column
, but don't enforce that they are only possible fields in Position
). This is especially true as babel
is not the only "source" we are handling errors from.
My comment was just to make sure you are aware as this sounds like something you wanted to avoid in patch release. For us it really doesn't matter (in practice, and not semantics) if it's a patch or minor given that right now we do use ^
version selector (and not pinned version or ~
for pinned minor)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, and again, apologies! This one's on me, I just forgot to delete the declaration after switching the uses of it.
99fa96b
to
5b4d55d
Compare
OK, I've gone ahead and implemented the
|
const { shorthandAssign, doubleProto, optionalParameters } = | ||
): void { | ||
if (!andThrow || !refExpressionErrors) { | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return value of checkExpressionErrors
is used in parseUpdate
. I am surprised that the test does not fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this was difficult to understand, but I think this is what's going on:
- The intent, as I understand it, of
refExpressionErrors
is to "hold onto" things that would be errors in expressions, but not in patterns. - The only places patterns can appear is in variable declarations, function parameters, and assignment expressions (which are ambiguous until we reach an
=
sign). - In the first two,
refExpressionErrors
is just passed down asnull
, so it is irrelevant for our discussion here. - However, if it the current expression was determined to be an assignment expression in
parseMaybeAssign
, thenrefExpressionErrors
will have already been cleared since we know this was a pattern (refExpressionErrors.doubleProtoLoc = null; // reset because double __proto__ is valid in assignment expression - As such, I think the check in
parseUpdate
never passes, as by this pointrefExpressionErrors
has been cleared.parseUpdate
is the only place that passesfalse
forandThrow
(and also the only place that bothers checking the result).
This is why no tests fail. I've done my best to construct expressions that find some other path where this line would have refExpressionErrors
and rely on andThrow being false to avoid throwing them, but I haven't been able to. If you can confirm the above, I can try to modify the code further to reflect how it's being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to add that I think it's totally possible that there is a relevant case that's handled by this and simply not tested, and I'm happy to change this code back to that, and also add said test if we can find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I went ahead and restored the old behavior I think, although I still can't seem to figure out how it behaves differently, but figured it wasn't worth blocking this PR on figuring that out. I can file a bug on exploring this further. We should try to get a test in there or figure out why it's never entered.
@@ -40,8 +41,7 @@ export default class ScopeHandler<IScope: Scope = Scope> { | |||
scopeStack: Array<IScope> = []; | |||
declare raise: raiseFunction; | |||
declare inModule: boolean; | |||
undefinedExports: Map<string, number> = new Map(); | |||
undefinedPrivateNames: Map<string, number> = new Map(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this PR the pos: Position
is expanded into an origin: Origin
. Personally I still prefer the argument order of previous .raise()
signature because the params
are coupled with the error template.
raise(origin: Origin, template: ErrorTemplate, ...params: any)
So, just upfront, I know this is a somewhat stylistic change so if you are not convinced by the following arguments, then I will absolutely swap the order back. That being said, I just wanted to check that you read my first response to Nicolo regarding this here, and a couple additional thoughts:
Anyways, as I stated earlier, please don't let the wall of text make you feel that am unwilling to swap it back , just wanted to share various "error thoughts" I had while making my way through this, and I am happy to swap it back if they are not convincing. |
…an correctly and efficiently track line and column offsets. Additionally: 1. Make Errors.ElementAfterRest a recoverable error. 2. Make expectContextual throw a nicer error telling you what it expected to find. Reviewed by @tolmasky.
…coverable, and that expectedContextual now throws an error message with the expected token. Also, update `packages/babel-parser/test/fixtures/es2015/uncategorised/277/input.js` since it had a different syntax error that was previously not reached since it would first fail on the fact that there was an element after rest. Reviewed by @tolmasky.
… should look into this more. Reviewed by @tolmasky.
…er to make index an actual property in the next minor release. Reviewed by @tolmasky.
5b4d55d
to
22d3112
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I noticed that there are a bunch of FIXME:
comments, but they mostly seem about "explain why it's safe to get the prev/next pos here".
// after a Hack-style pipe body is parsed. | ||
// The `startPos` is the starting position of the pipe body. | ||
|
||
checkHackPipeBodyEarlyErrors(startPos: number): void { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for noticing this unused code.
@@ -32,24 +35,7 @@ export class SourceLocation { | |||
} | |||
} | |||
|
|||
// The `getLineInfo` function is mostly useful when the | |||
// `locations` option is off (for performance reasons) and you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, it looks like this option doesn't exist anymore.
Summary
The main change in this PR is to remove our dependence on
getLineInfo
by instead relying more heavily on thePosition
versions of the various locations that are passed around during parsing. Aside from thus "naturally" accounting forstartLine
andstartColumn
with no additional work, this also avoids the pathalogical cases whengetLineInfo
can't rely on previously cached locations and must thus re-iterate through the entire file to re-generate line and column information, despite the fact that other parts of the program already have this information. Although this is perhaps not too terrible in the default{ errorRecovery: false }
case of callingparse
, it can be quite devastating to performance if you are running with{ errorRecovery: true }
, like we do in RunKit.Approach
Perhaps the most important change is that I have added an
index
member to thePosition
class, which tracks the source index (the value represented by, e.g.,State.[start|end]
vs.State.[start|end]Loc
):The reason for this is that these two values do exist in a one-to-one mapping, and thus anywhere where they must be passed separately simply introduces the possibility of erroneously getting them out of sync. It is also almost always the case that various methods, from an interface perspective, either care about both, or just the
Position
, and thus we are not creating too many cases where we are creating extra objects during the parse. That is to say, the places where it is beneficial to just updatestate.pos
all still behave that way since they are in tight loops, not updated and then calling into other functions (that we weren't already doing this for).I have made this change fairly defensively by making index be a non-enumerable member of
Position
:This way, I wouldn't have to re-generate every existing test to include this new addition, and it also won't show up unexpectedly if anyone is JSON-ing these trees or whatever. I am however happy to just make it a normal member variable though, if that is preferred. I just figured I'd start off doing it this way, as it is easy to change, and so that more attention can be paid to the fewer "meaningful" test changes instead of being bombarded with a PR that changes just about every test.
As such, a few key methods have changed:
raise()
Where
Origin
is defined as:Aside from the usage of
Position
instead of the previousnumber
index, the idea here is to make the most common use cases super simple, as well as needing less Flow gymnastics (and IMO making it more readable). Namely, in the very common case where you want toraise
an error "from" a node, the code looks something like this:This then does the right thing by extracting the
Position
information from the node in question. Additionally, if we in the future want to change errors to support ranges (so that you can for example underline the entire node vs. just point to the beginning), we could now much more easily do this for all these calls, since we'd only need to updateraise
to also grab theend
from the node too.If however you still want to pass a specific location for the error, you use the
at
form:This behaves much like the old version, except taking a
Position
instead of anumber
index. I believe both of these read closer to thethrow SomeError(info)
pattern they are approximating, as well as making it immediately obvious whether they are needing to do complex location stuff or just defaulting to the "standard" case of identifying a node as having an error.unexpected()
The other big error change is to
unexpected
.unexpected
used to take either an error template or a token, but now only does the token version:The reason for this change is that you really don't "get" anything from using the
ErrorTemplate
form ofunexpected
, as it collapses down to just being a call toraise
withpos
set tothis.state.start
and automatically thrown. In other words, there's no special "unexpected messaging" or anything taking place, since theErrorTemplate
circumvents all that logic, and it's basically synonymous with just doingthrow this.raise()
. Despite this, it requires a fair amount offlow
gymnastics to support these two unrelated types. As such, I replaced the few instances of this usage withthrow this.raise()
and leftunexpected()
to solely handle the token case.expectContextual()
There's no big change in
expectContextual
, except that thanks to the aboveunexpected
change, it can now throw a nicer error. Previously, something likethis.expectContextual(tt._as)
would throw an error message saying "Unexpected token", whereas now it will throw an error that says ""Unexpected token, expected as".Closes #14144
Side-effects
As a side-effect of this change, a number of function interfaces became simpler, now only requiring a location be passed in instead of the index/Position pair. We could actually do this with way more methods, but I limited it to solely those methods that were directly affected by my changes necessary for fixing errors here. This is particularly nice for methods that take optional locations, as this removes the ambiguity about whether you can pass just the integer index or if you must either pass both or neither.
Additionally,
getLineInfo
andgetLocationForPosition
have been removed, since they aren't needed anymore.More recoverable errors
I'd like to focus on this more after this PR, since errorRecovery is kind of what set us down this path, but just due to the nature of this change in particular, I was able to easily make
Errors.ElementAfterRest
a recoverable error, and the tests have been updated accordingly to reflect this.Closes #14123.